Abstract

This paper describes two separate efforts that used the 
SPIN model checker to verify deep space autonomy flight 
software. The first effort occurred at the beginning of a 
spiral development process and found five concurrency 
errors early in the design cycle that the developers
acknowledge would not have been found through testing.
This effort required a substantial manual modeling effort 
involving both abstraction and translation from the pro- 
totype LISP code to the PROMELA language used by 
SPIN. This experience and others led to research to ad- 
dress the gap between formal method tools and the de- 
velopment cycle used by software developers. The Java 
PathFinder tool, which translates directly from Java to
PROMELA, was developed as part of this research, as well
as automatic abstraction tools. In 1999 the flight software 
flew on a space mission, and a deadlock occurred in a 
sibling subsystem to the one which was the focus of the 
first verification effort. A second quick-response “clean- 
room” verification effort found the concurrency error in a 
short amount of time. The error was isomorphic to one 
of the concurrency errors found during the first verifica- 
tion effort. The paper demonstrates that formal methods 
tools can find concurrency errors that indeed lead to loss 
of spacecraft functions, even for the complex software 
required for autonomy. Second, it describes progress in 
automatic translation and abstraction that eventually will 
enable formal methods tools to be inserted directly into 
the aerospace software development cycle. 


1 Introduction 

Complex concurrent software is difficult to debug and 
even more difficult to test with adequate coverage. With 
the increasing power of flight-qualified microprocessors,
NASA space enterprises are experimenting with a new 
generation of non-deterministic flight software that pro- 
vides enhanced mission capabilities. A prime example is 
the Remote Agent (RA) autonomous spacecraft controller 
developed at NASA. In May 1999, the RA was successfully
demonstrated in flight on Deep Space 1 (DS-1), the
first flight of NASA’s experimental New Millennium pro- 
gram. The RA is a complex, concurrent software system 
employing several automated reasoning engines using ar- 
tificial intelligence technology. The verification of such 
complex software is critical to its acceptance by science 
mission managers. 

This paper describes formal methods verification efforts
for one of the three subsystems of the RA - specifically,
the RA Executive, which provides operating-system
level capabilities for goal-directed software. Two differ- 
ent verification activities were conducted, before and af- 
ter flight, using different technologies and in very differ- 
ent contexts. As such, this paper provides two succes- 
sive snapshots of progress towards making formal meth- 
ods verification cost-effective. 

In 1997, while the RA was still in the development 
stage, we modeled and verified a subset of the core ser- 
vices of the RA Executive using the SPIN [10] model 
checker. That verification unveiled several concurrency
bugs that were acknowledged by RA Executive developers [7].

As a result of this effort, it was decided to develop 
model checking technology for a mainstream programming
language in order to reduce the amount of time spent
on modeling the behavior of programs in SPIN. The result 
was a translator, called Java PathFinder, from Java to the 
modeling language PROMELA of SPIN. In addition, a 
tool was developed for abstracting Java programs to re- 
duce their state space, making model checking tractable. 

Then, during the actual RA experiment in 1999, a deadlock
occurred within less than 24 hours of operation. Although
the problem was promptly identified and circumvented
by the DS-1 team, we took the challenge of trying
to diagnose the error in a fast-response “clean room”
experiment¹. After isolating a suspicious part of the program
by visual inspection, we modeled it in Java, and then
used Java PathFinder to exhibit a concurrency error that
indeed turned out to be the one that had occurred in flight.

One key observation of these two successive experi- 
ments is that the error that caused the deadlock is ex- 
actly isomorphic to one of those found using SPIN two 
years before in another part of the code. It is a concur- 
rency error, whose activation depends on a priori unlikely 
scheduling conditions between concurrent tasks. In fact, 
this error did not appear in over 300 hours of system-level 
testing on JPL’s flight system testbed. The conditions un- 
der which it occurred in flight were not anticipated during 
testing. A principal benefit of model checking technolo- 
gies is to be able to exhaustively cover scheduling alter- 
natives. This paper gives a compelling illustration of how 
model checking found an error that was a priori unlikely 
but did actually occur. It also discusses gaps between pre- 
vious formal method tools and requirements for making 
them easily accessible to system developers for ‘in the 
loop’ verification. Technological advances towards nar- 
rowing this gap are described in the context of the RA 
verification. 

Section 2 describes the RA experiment. Section 3 describes
the verification effort before flight, while Section
4 describes the verification effort after flight and
also presents Java PathFinder. Section 5 describes
the Java abstraction tool, and finally, Section 6 contains a 
conclusion. 


¹ By “clean room” we are referring to the fact that, while the verification
was post facto, the team had no interaction with the actual debugging
team.


2 The Remote Agent Experiment 

To prepare for space exploration programs of the next 
decades within a reduced budget, NASA has set up the 
New Millennium program: a series of technology vali- 
dation flights whose objective is to accelerate the quali- 
fication for flight of new spacecraft technology. One of 
the objectives of the New Millennium program is to in- 
crease spacecraft autonomy, moving from the low-level 
control sequences currently in use towards mission-level 
planning and autonomous health monitoring and recov- 
ery. 

Deep Space 1 (DS-1), the first New Millennium Mis- 
sion, was launched from Cape Canaveral on October 24, 
1998 and ended its primary mission in September 1999 
(it is still operating and is on its way for a comet en- 
counter in 2001). During that mission, it successfully 
tested 12 cutting-edge technologies such as ion propulsion,
on-board optical navigation, and the AI-based Remote
Agent, marking the first operational use of artificial
intelligence during space flight. 

In its initial design, the RA Experiment (RAX) on DS- 
1 consisted of a short, limited 12-hour scenario designed 
to gain confidence in the RA, followed by a complete 
6-day scenario that was the full RA test. Later, the ex- 
periment had to be compressed into a single 2-day sce- 
nario, to accommodate external mission constraints. The 
original scenarios were designed to cover a formal list 
of validation objectives. To protect the main DS-1 mis- 
sion from possible misbehaviors of RA, the design in- 
cluded a “safety net” that allowed the RA experiment to 
be completely disabled with a single command, issued ei- 
ther from the ground or by on-board fault protection. 

The RA went through a thorough qualification process 
before being allowed to run on DS-1. Though some formal
verification tasks, such as the one reported here, were
performed as feasibility studies, the formal qualification 
process relied on more conventional testing approaches. 
However, since the RA was a flight experiment, and not 
flight software, it was not subjected to the testing stan- 
dards of the latter. 

This section is a short summary of the flight qualification
and experience of the RA [2, 13].

2.1 Remote Agent 

The RA is an autonomous spacecraft controller developed 
by NASA Ames conjointly with the Jet Propulsion Labo- 
ratory (JPL) [12]. It comprises three components: 



• The Planner and Scheduler (PS) [11] generates flex- 
ible plans, specifying the basic activities that must 
take place. Given a mission goal, it produces se- 
quences of tasks for achieving this goal using avail- 
able system resources. 

• The Smart Executive (EXEC) [14] receives the plan 
from the Planner/Scheduler and then commands 
spacecraft systems to take the necessary actions to 
achieve and maintain the specified spacecraft states. 

• The Mode Identification and Recovery component 
(MIR), called Livingstone [16], monitors the state 
of the spacecraft, detects and diagnoses failures and 
suggests recovery actions to the Executive. 

The Executive subsystem is the focal point of the verification
work discussed in this article. It combines features
of multi-threaded operating systems with aspects of AI
languages based on sub-goaling, such as Prolog. It is conceptually
composed of three layers: a set of core services
that implement a robust operating system for executing
concurrent tasks, a set of engine modules including a plan
runner, and a set of mission-specific task programs. The
Executive schedules the execution of concurrent tasks. It
also monitors a set of properties associated with system
resources, and takes recovery actions on property violations.
The Executive is written in a multi-threaded LISP,
using a set of LISP macros called the Executive Sequencing
Language (ESL) developed at JPL.

2.2 Testing the Remote Agent 

Because autonomous systems such as the RA need to re- 
spond robustly in a wide range of situations, verifying that 
they respond correctly in all situations would require a 
huge number of test cases. Furthermore, these tests ide- 
ally have to be run on high-fidelity testbeds that are highly 
oversubscribed, hard to configure, and, running at real 
time speeds, take hours or days for a single run. 

To address these problems, the RAX team followed a 
“baseline testing” approach, starting from nominal sce- 
narios and testing a number of nominal and off-nominal 
variations around these scenarios. A wide range of varia- 
tions were run on more available and faster low-fidelity 
testbeds, leading to the identification and resolution of 
100-200 bugs during 18 months. An automated testing
tool was designed for this purpose. Some of the
most likely off-nominal variants were run on medium-fidelity
testbeds, while only nominal scenarios and certain
performance and timing related tests were performed
on high-fidelity testbeds. The final stage was a pair of
“dress rehearsal” operational readiness tests (ORTs), involving
actual communication with the mission control
center. The bulk of the problems identified during testing
were found with the low-fidelity testbeds. The ORTs only
identified minor shortcomings that were resolved prior to
flight.

2.3 Remote Agent in Flight 

On Monday, May 17th, 1999, 11:04 am PDT, a telemetry
packet confirmed that the RA had taken control of DS-1.
The scenario went on smoothly, achieving 70% of the
objectives, until Tuesday 7:00 am, when it became apparent
that a command had not been executed as expected
by the RA. The RA Executive was blocked, although the
rest of the RA and the spacecraft were otherwise healthy.
The Executive’s low-level commands were used to gather
a maximum of information, and then the experiment was
interrupted.

By late Tuesday afternoon, the RAX team had found
the source of the problem in the Executive code. They
designed a 6-hour scenario that was run on Friday morning
and went successfully through the remaining 30% of
the objectives. A patch was also generated, but the DS-1
mission decided not to uplink it, considering the insufficient
testing of the patch and the very low probability of
the problem recurring.

The blocking was due to a missing critical section 
that had led to a race condition between two concurrent
threads. Under some very precise and unlikely timing cir- 
cumstances, both threads could end up in a deadlock con- 
dition in which each one was waiting for an event that 
only the other one could provide, which is exactly what 
happened in flight. 

3 Formal Analysis Before Flight 

In April-May 1997 we analyzed part of the RA Executive
using the SPIN model checker [7]. This effort led to the
discovery of five errors in the LISP code, which are described
below. As discussed in Section 4.3, one of these
errors is isomorphic to the error that actually occurred
during flight, causing a deadlock. First we give a short description
of SPIN and its modeling language PROMELA.
Then we explain how a PROMELA model was extracted
from the LISP code, and how properties were stated and
verified in the model, leading to the discovery of the five
errors. We conclude with a discussion of the methodology
that has been followed.

3.1 The SPIN Model Checker 

SPIN [10] is a tool for analyzing the correctness of finite
state concurrent systems with respect to formally
stated properties. A concurrent system is modeled in
the PROMELA modeling language, and properties to be
verified are formalized as assertions in the program or
as formulae in the temporal logic LTL (Linear Temporal
Logic). SPIN provides a model checker, which automatically
examines all program behaviors in order to decide
whether the PROMELA program satisfies the stated properties.
In case a property is not satisfied, an error trace
is generated, which illustrates the sequence of executed
statements from the initial state to the state that violates
the property. These error traces can then be executed in
a simulator. The set of states reachable from the initial
state must be finite in case a property needs to be proven
correct for the whole state space.

A PROMELA program consists of a set of sequential 
processes that communicate via message passing through 
bounded buffered channels and via shared variables. Pro- 
cesses can be created dynamically. The behavior of an 
individual process is described using the statement lan- 
guage which provides many standard constructs such as 
variable assignments, channel communications, loops, 
conditionals, and sequential composition. Variables are 
typed, where a type can either be primitive, such as in- 
teger, or composite in the form of arrays and records. 
PROMELA provides inline procedures, which is a lim- 
ited notion of procedural abstraction that is implemented 
via macro expansion. 

Each process represents a finite automaton, and the
global behavior of the system is then obtained by computing
on-the-fly an asynchronous interleaving product of all
these automata, creating the global state space. To perform
model checking, SPIN translates (the negation of)
any LTL formula into a Büchi automaton, and computes
the synchronous product of this and the global state space.
The result is again a Büchi automaton. If the language of
this automaton is empty it means that the formula is satisfied.
SPIN searches the state space depth-first, creating
the states on-the-fly. A partial-order reduction technique
is used to prune the set of transitions to be explored.
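
The idea of exploring an interleaving product depth-first can be illustrated with a minimal Python sketch. It is not SPIN: it has no LTL support and no partial-order reduction, and the two-process example (each process reads a shared variable and then writes back the value plus one) is a hypothetical stand-in chosen only to show how an exhaustive search over schedules yields an error trace.

```python
from typing import Callable, List, Optional, Tuple

# A state is (pc0, pc1, tmp0, tmp1, shared): two program counters,
# two local registers, and one shared variable (all hypothetical).
State = Tuple[int, int, int, int, int]


def successors(state: State) -> List[Tuple[str, State]]:
    """All states reachable by letting exactly one process take one step."""
    result = []
    pcs, tmps, shared = list(state[0:2]), list(state[2:4]), state[4]
    for pid in (0, 1):
        pc = pcs[pid]
        new_pcs, new_tmps, new_shared = pcs[:], tmps[:], shared
        if pc == 0:        # step 0: read the shared variable
            new_tmps[pid] = shared
        elif pc == 1:      # step 1: write back the incremented value
            new_shared = tmps[pid] + 1
        else:              # this process has terminated
            continue
        new_pcs[pid] = pc + 1
        label = f"P{pid}:{'read' if pc == 0 else 'write'}"
        result.append((label, (*new_pcs, *new_tmps, new_shared)))
    return result


def find_violation(initial: State,
                   invariant: Callable[[State], bool]) -> Optional[List[str]]:
    """Depth-first search over all interleavings, generating states on the fly.

    Returns the list of steps leading to a state that violates the
    invariant, or None if every reachable state satisfies it.
    """
    visited = set()
    stack = [(initial, [])]
    while stack:
        state, trace = stack.pop()
        if state in visited:
            continue
        visited.add(state)
        if not invariant(state):
            return trace
        for label, nxt in successors(state):
            stack.append((nxt, trace + [label]))
    return None


def both_done_implies_two(state: State) -> bool:
    """Assertion: once both processes have finished, shared must be 2."""
    pc0, pc1, _, _, shared = state
    return not (pc0 == 2 and pc1 == 2) or shared == 2


trace = find_violation((0, 0, 0, 0, 0), both_done_implies_two)
print(trace)
```

The returned trace plays the role of SPIN's error trace: it names one schedule, out of all possible ones, in which both reads happen before either write, so one increment is lost.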

3.2 Creating a PROMELA Model 

The modeling activity focused on the core services of the 
plan execution module. The RA Executive core is de- 
signed to support execution of software-controlled tasks 
on board the spacecraft. A task often requires specific
properties to hold during its execution. When a task is 
started, it first tries to achieve the properties on which it 
depends, after which it starts performing its main func- 
tion. Several tasks may try to achieve conflicting proper- 
ties; for example, one task may try to turn on a camera 
while another task tries to turn it off. To prevent such 
conflicts, a task has to lock in a lock table any property 
it wants to achieve. Once a property is locked, it can be
achieved by the task locking the property.

Properties may, however, be unexpectedly broken
while tasks depending on them are executing. A property
is defined as broken when it is locked in the lock table by
some task, has been achieved (an extra boolean field in
the lock table), but for some reason fails to hold on board
the spacecraft. For the purpose of detecting which properties
hold on board, a database is maintained of all properties
being true at any time. Hence, an inconsistency can
be detected by relating the lock table with the database.
Tasks depending on a broken property must be interrupted
and informed about the anomaly. For this purpose, a daemon
monitors the changes on board the spacecraft, and in
particular the consistency between the lock table and the
database. The daemon is normally asleep, but is awakened
whenever there is a change in the lock table or the
database, whereupon it checks their consistency.
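
The consistency check described above can be sketched in a few lines of Python. The data layout is an assumption made for illustration (the names `LockEntry`, `broken_properties`, and `tasks_to_abort`, and the representation of properties as strings, do not come from the RA Executive); only the definition of a broken property follows the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class LockEntry:
    """One row of a hypothetical lock table."""
    tasks: Set[str]          # tasks that depend on the locked property
    achieved: bool = False   # the extra boolean field mentioned in the text


def broken_properties(lock_table: Dict[str, LockEntry],
                      database: Set[str]) -> List[str]:
    """Properties that are locked and achieved but do not hold on board."""
    return sorted(prop for prop, entry in lock_table.items()
                  if entry.achieved and prop not in database)


def tasks_to_abort(lock_table: Dict[str, LockEntry],
                   database: Set[str]) -> Set[str]:
    """Tasks that must be interrupted because a property they rely on broke."""
    affected: Set[str] = set()
    for prop in broken_properties(lock_table, database):
        affected |= lock_table[prop].tasks
    return affected


# Example: the camera was turned on (achieved), but the database of
# properties currently holding on board no longer contains it.
lock_table = {
    "camera-on": LockEntry(tasks={"take-picture"}, achieved=True),
    "heater-on": LockEntry(tasks={"warm-up"}, achieved=False),
}
database = {"thruster-off"}

print(broken_properties(lock_table, database))
print(tasks_to_abort(lock_table, database))
```

A property that is locked but not yet achieved ("heater-on" here) is not reported, since by the definition above it cannot be broken before it has been achieved.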

The PROMELA model focuses on operations on the
lock table. Hence, it is an abstraction of the LISP pro- 
gram, omitting details irrelevant for the lock table opera- 
tions. The LISP program is approximately 3000 lines of 
code while the PROMELA model is 500 lines of code. 
Furthermore, the model only deals with a limited number 
of tasks and properties in order to limit the search space 
the SPIN model checker has to explore. Most abstrac- 
tions were made in an informal manner without any for- 
mal proofs showing that bugs are maintained. Hence, in 
the abstraction phase we may have left out errors in the 
LISP code. However, all the errors we found in the model 
were also errors in the LISP code. 

To give an idea of the modeling, we show how the daemon
was translated, since it was the daemon that contained
the error pattern which also occurred during flight,
and which was found using the model checker. The actual
LISP code describing the behavior of the daemon is given
in Figure 1.

(defun daemon ()
  (loop
    (if (check-locks)
        (do-automatic-recovery))
    (unless
        (changed?
          (+ (event-count *database-event*)
             (event-count *lock-event*)))
      (wait-for-events
        (list *database-event*
              *lock-event*)))))

Figure 1: Daemon in LISP

The daemon goes through a loop, where in each itera- 
tion it checks the lock table, comparing it to the database, 
and recovers any inconsistencies that may be detected (if 
the check-locks function returns true). After that, it 
goes to sleep by calling the wait-for-events func- 
tion, which as parameters takes a list of events to wait 
for. Whenever one of these events is signaled, i.e. the 
database or the lock table is modified, the daemon will 
wake up and continue. 

In order to catch events that occur while the daemon is 
executing, each event has an associated event counter that 
is increased whenever the event is signaled. The daemon 
only calls wait-for-events in case these counters have
not changed, hence, there have been no new events since 
it was last restarted from a call of wait-for-events. 

The PROMELA model of this LISP code is presented
in Figure 2. The if-construct decides whether the daemon
should stop and wait for a new database event or lock
event to occur (call of wait_for_events), or whether
it should continue for another iteration. Another iteration
is needed if a database event or a lock event has occurred
since the daemon was restarted last time; that is, in
case the event counter event_count differs from the sum
of the event counters for the database and lock events.
If there is a difference, it means that there has been an
event since the last time event_count was updated, and
the daemon must perform another iteration before calling
wait_for_events, first updating event_count to hold
the new event counter sum.


proctype daemon(TaskId this) {
  byte event_count = 0;
  do
  :: check_locks_and_recover;
     if
     :: (Ev[DATABASE_EVENT].count +
         Ev[LOCK_EVENT].count
         == event_count) ->
          wait_for_events(this,
            DATABASE_EVENT, LOCK_EVENT)
     :: else ->
          event_count =
            Ev[DATABASE_EVENT].count +
            Ev[LOCK_EVENT].count
     fi
  od
}

Figure 2: Daemon in PROMELA 

3.3 Stating and Verifying Properties 

The model was analyzed with respect to the following 
two properties, here expressed informally. The release 
property reads: “A task releases all of its locks before it 
terminates”. The abort property reads: “If an inconsis- 
tency occurs between the database and an entry in the 
lock table, then all tasks that rely on the lock will be ter- 
minated, either by themselves or by the daemon in terms 
of an abort” . The release property was formulated by in- 
serting an assertion in the code at the end of each task. 
This assertion stated that all locks should be released at 
this point. The second property was stated as a linear tem- 
poral logic property of the form: 

    [] (property_broken -> <> tasks_informed)

This property says: whenever a property is broken,
then eventually all tasks depending on this property will
be informed about it (in fact terminated). The names
property_broken and tasks_informed are macro
names standing for predicates on the state space.

The attempted verification of the two properties led to 
the direct discovery of five programming errors - one 
breaking the release property, three breaking the abort 
property, and one being a non-serious efficiency problem 
where code was executed twice instead of once. The first 
four of these errors are classical concurrency errors in the
sense that they arise due to processes interleaving in un- 
expected ways. 



The error we want to focus on in this presentation is the
one isomorphic to the RAX anomaly. The error caused 
the abort property to be violated. The error trace gener- 
ated by SPIN demonstrated the following situation. The 
daemon is prompted to perform a check of the lock table. 
It finds everything consistent and checks the event coun- 
ters to see whether there have been any new events while 
it has been running. This is not the case, and the daemon 
therefore decides to call wait-for-events. However,
at this point an inconsistency is introduced, and a signal
is sent by the environment, causing the event counter for 
the database event to be increased. This is not detected 
by the daemon since it has already made the decision to 
wait, which it then does, and the inconsistency now is not 
discovered by the daemon. Our suggested solution at the 
time was to enclose the test and the wait within a critical 
section, which does not allow scheduling interrupts to oc- 
cur between the test and the wait. Furthermore, two other 
flawed code fragments violated the abort property. 
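
The suggested remedy, making the test of the event counter and the wait one critical section, corresponds to the standard condition-variable idiom. The following Python sketch shows that idiom using `threading.Condition`; it is an analogy in a different language, not the ESL patch, and the class `CountedEvent` and the function `daemon` are hypothetical names introduced here.

```python
import threading
from typing import List


class CountedEvent:
    """An event with a counter, where test and wait share one lock."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._count = 0

    def signal(self) -> None:
        # The counter is updated under the same lock the waiter holds
        # while testing, so a signal cannot fall between test and wait.
        with self._cond:
            self._count += 1
            self._cond.notify_all()

    def wait_for_change(self, seen: int) -> int:
        """Block until the counter differs from `seen`; return the new count."""
        with self._cond:
            # wait_for re-checks the predicate while holding the lock and
            # releases the lock only while actually sleeping.
            self._cond.wait_for(lambda: self._count != seen)
            return self._count


def daemon(event: CountedEvent, total: int, log: List[int]) -> None:
    """Wake up on every change until `total` signals have been observed."""
    seen = 0
    while seen < total:
        seen = event.wait_for_change(seen)
        log.append(seen)  # stand-in for checking the lock table


event = CountedEvent()
log: List[int] = []
worker = threading.Thread(target=daemon, args=(event, 3, log), daemon=True)
worker.start()
for _ in range(3):
    event.signal()
worker.join(timeout=5)
print(log)
```

Several signals may be observed in a single wake-up, so the log can be shorter than three entries, but no signal is lost: the daemon always terminates having seen the final counter value.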

The release property was violated in the sense that 
locks did not always get released by a task. The error trace 
generated by SPIN demonstrated that during a task’s re- 
lease of a lock, but before its actual release, the task may 
get interrupted by the daemon if the property gets broken. 
This means that the task terminates without releasing the 
lock. The error is particularly nasty in the sense that all 
code, except the lock releasing itself, had been protected 
against this situation: in case of an interrupt the lock re- 
leasing would be executed. 

The model was verified exhaustively using SPIN’s
partial order reduction algorithm and state compression.
Typically between 3,000 and 200,000 states were explored
in the different models, using between 2 and 7 MB of memory
and taking between 0.5 and 20 seconds.

3.4 Discussion of Methodology 

The verification effort has been regarded by all involved 
parties as a very successful application of model check- 
ing, and of SPIN in particular. According to the RA pro- 
gramming team, the effort has had a major impact, lo- 
cating errors that would probably not have been located 
otherwise, and identifying a major design flaw. 

The modeling effort, i.e. obtaining a PROMELA 
model from the LISP program, took about 12 man weeks 
during 6 calendar weeks, while the verification effort took 
about one week. The modeling effort consisted conceptually
of an abstraction activity combined with a translation
activity. Abstraction was needed to cut down the
program to one with a reasonably small finite state space, 
making model checking tractable. Translation, from LISP 
to PROMELA, was needed to obtain a PROMELA model 
that the SPIN model checker could analyze. 

The abstraction was done without any knowledge about 
the properties to be verified, since these were stated later. 
The abstraction maintained important operations on the 
lock table and ignored most other details of the orig- 
inal LISP program, hence, a kind of program slicing. 
No formal attempt was made to show that the abstrac- 
tions preserved errors. It is interesting that such an ad 
hoc approach still was extremely effective. The transla- 
tion phase was non-trivial and time consuming due to the 
relative expressive power of LISP when compared with 
PROMELA. 

Based on these observations, two research efforts were 
initiated that should make application of model checking 
within the software development cycle less resource de- 
manding. In one effort a translator from the Java pro- 
gramming language to PROMELA has been developed; 
see Section 4.2. In another effort, an abstraction tool 
has been developed, which can perform so-called predi- 
cate abstractions on Java programs; see Section 5. Both 
tools have been applied in the verification of the RA as 
described in the following. 

4 Formal Analysis After Flight 

Shortly after the anomaly occurred during the Remote 
Agent Experiment, on Tuesday May 18, the ASE team 
at NASA Ames heard that something had broken down 
in the RA while it was in control of the spacecraft and 
offered their help to the RAX team. On Friday morning,
after a few email exchanges, the RAX team provided ac- 
cess to the source code of the Executive, without identi- 
fying where the error was, and offered the ASE group the 
challenge of seeing “how long it would take for formal 
methods to come up with it”. 

On Friday afternoon, we decided to run a “clean room”
experiment to determine whether or not the technology 
currently used and under development in the group could 
have discovered the bug before it actually happened. At 
that time, we knew that debugging information collected 
from the spacecraft had enabled the DS-1 team to identify 
the bug and continue the experiment, and that the failure 
had something to do with a “handshaking” communication
between a Planner process and an Executive process.
Other than these messages we had no further information,
and no one in the ASE group had any contact with RAX
personnel during that week.

This section first describes how the experiment was 
conducted. Then the Java PathFinder translator that was 
used to model check the flawed code is described. This 
is followed by a description of the error and how it was 
found using Java PathFinder. We conclude with a discussion
of the methodology that has been followed.

4.1 The Clean Room Experiment 

To make this clean room experiment credible, we decided
that we would need to complete this exercise over
the weekend, prior to the return of the RAX team from
the DS-1 mission control at JPL the following Monday.
This was both to avoid undue influence by people familiar
with the details of the bug, and also to meet the
“short-turnaround” challenge, mimicking what would be
required if we were actually called on to provide “on-line”
assistance.

The experiment was set up as follows. A front-end 
group would try to spot the error by human inspection, 
or at least identify problematic parts of the code. On the 
basis of that, it would extract a more or less self contained 
portion of the code containing the problematic code por- 
tions, of a tractable size for a model checker. This ex- 
tracted code would then be handed over to the back-end 
group without any hints as to what could be the error. The 
back-end group would then try to locate the error using 
model checking. The situation was comparable to some- 
one doing visual inspection of code, and finding suspect 
sections which he wanted to explore further. 

The front-end team began perusing the code on Friday
afternoon, and extracted roughly 700 lines containing
questionable code². The full group met again on Saturday
afternoon, and the front-end team gave the back-end
team the extracted code. In accordance with the design of
the experiment, they did not tell where the suspected bug
was, but they briefed the back-end team on the control and
data structures of the extracted code. The back-end group
spent most of the time understanding that code in order to
model it, and on Sunday morning came out with a fairly
abstract model of the suspicious code. That model was
written in Java and verified with the Java model checker
Java PathFinder, as described below. It reported a deadlock,
which turned out to be the one that had happened in
flight five days before.

² Though they were not sure that they had indeed captured the concurrency
error.

4.2 The JPF Translator 

Java PathFinder (JPF) [8, 6] is a translator from a non- 
trivial subset of Java to PROMELA. Given a Java pro- 
gram, JPF translates this into a PROMELA program, 
which then can be model checked using SPIN. Java is an 
object-oriented programming language with a built-in no- 
tion of threads. Objects are instantiated dynamically from 
classes, which can be defined using single class inheri- 
tance. Threads, which are special objects with an activity, 
can communicate by making calls to methods defined in 
shared objects. Such methods can be defined as synchro- 
nized, thereby turning these shared objects into monitors, 
allowing only one thread to operate in the object at a time. 

In the default mode, the SPIN model checker will find
any deadlocks present in the Java program. Such deadlocks
can occur when several threads compete for access
to the monitors. Properties can also be formulated explicitly
by the user, either as assertions in the program, or as
linear temporal logic formulae. That is, a Java program
can be annotated with assertions written as calls to a special
assert method which takes a boolean argument expression
over the variables in the Java program. Any such
call is translated into a corresponding PROMELA assertion,
which will then be checked during the state space
exploration whenever reached. Finally, SPIN’s own linear
temporal logic can be used to formulate properties
over the Java program’s static variables (a static variable
in Java is defined within a class, but is only allocated once,
and hence is shared between all objects of the class).

A significant subset of Java is supported by JPF: dy- 
namic creation of objects with data and methods, static 
variables and static methods, class inheritance, threads 
and synchronization primitives for modeling monitors 
(synchronized statements, and the wait and notify 
methods), exceptions, thread interrupts, and most of the
standard programming language constructs such as as- 
signment statements, conditional statements and loops. 

The translator is written in 6000 lines of LISP, and was 
developed over a period of 8 months. JPF has been ap- 
plied to a number of case studies, amongst them a 1500 
line game server [9], a NASA file transfer protocol for 
satellites, and a NASA data transmission protocol for the
space shuttle ground control. 

A related attempt to provide model checking technology
for Java is described by Demartini et al. [5], which
also translates Java programs into PROMELA. However,
their approach does not handle exceptions or polymorphism,
both of which Java PathFinder supports. In another related
approach, Corbett [4] describes a theory of translating Java
to a transition model, making use of static pointer analy- 
sis to aid virtual coarsening, which reduces the size of the 
model. 


4.3 The RAX Error 

The suspected and eventually confirmed error was a miss- 
ing critical section around a conditional wait on an event. 
The relevant piece of code (anonymized for confidential- 
ity purposes) is shown in Figure 3. 

(loop
  (when
    (*1*) (or (/= count (esl::event-count event1))
    (*2*)     (warp-safe (wait-for-event event1)))
    (setf count (esl::event-count event1))
    (*3*) (signal-event event2)))

Figure 3: The RAX Error in LISP 

This is the body of one of the concurrent tasks and consists
of a loop. The loop starts with a when statement
whose condition is a sequential-or statement (see footnote 3)
that states: if the event counter has not been changed (*1*), then
wait (*2*), else proceed. This behavior is supposed to
avoid waiting on the event queue if events were received
while the process was active. However, if the event occurs
between (*1*) and (*2*), it is missed and the process
goes to sleep. Because the other process that produces
those events is itself activated by events created by this
one in (*3*), both end up waiting for each other, a deadlock
situation.

This follows a similar pattern to the code shown in Fig- 
ure 1 that had been identified as a source of error during 
the verification of the Executive in 1997, as described in 
Section 3.3. This similarity was spotted by members of 
both the front-end and back-end teams, and contributed 
greatly to narrowing down the verification effort to this 
particular potential problem. 
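
To make the notion of a critical section around a conditional wait concrete, here is a minimal Java sketch of one common way to close such a window: the test of the counter and the wait run under the same monitor as the signal. This is an illustration of the general idiom, not the repair actually applied to the flight code; the names SafeEvent, awaitChange and PingPong are hypothetical.

```java
// Sketch of an event whose test and wait share one critical section.
final class SafeEvent {
    private int count = 0;

    // Blocks until the counter differs from the value the caller last saw,
    // then returns the new value. Because this method and signalEvent() are
    // both synchronized, no signal can slip in between the test and wait().
    synchronized int awaitChange(int lastSeen) throws InterruptedException {
        while (count == lastSeen) {
            wait();
        }
        return count;
    }

    synchronized void signalEvent() {
        count++;
        notifyAll();
    }
}

final class PingPong {
    // Two tasks wake each other for `rounds` iterations each.
    // Returns true if both finish before the timeout expires.
    static boolean run(int rounds, long timeoutMillis) {
        SafeEvent e1 = new SafeEvent();
        SafeEvent e2 = new SafeEvent();
        Thread first = new Thread(() -> loop(e1, e2, rounds));
        Thread second = new Thread(() -> loop(e2, e1, rounds));
        first.setDaemon(true);
        second.setDaemon(true);
        first.start();
        second.start();
        e1.signalEvent(); // initial event that starts the exchange
        try {
            first.join(timeoutMillis);
            second.join(timeoutMillis);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return !first.isAlive() && !second.isAlive();
    }

    // Wait for a new event on `in`, then signal `out`.
    private static void loop(SafeEvent in, SafeEvent out, int rounds) {
        try {
            int seen = 0;
            for (int i = 0; i < rounds; i++) {
                seen = in.awaitChange(seen);
                out.signalEvent();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A call such as PingPong.run(1000, 5000) should return true: since an event that arrives before the waiter looks is still recorded in the counter, and the counter is only examined while holding the monitor, neither task can sleep through a signal.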


3 (or X Y) is evaluated like if X then true else Y. 


4.4 Demonstrating the Error with JPF 

The modeling focused on the code under suspicion for 
containing the error. The major two components to be 
modeled were events and tasks, as illustrated in Figure 4. 
The figure shows a Java class Event from which event 
objects can be instantiated. The class has a local counter 
variable and two synchronized methods, one for waiting 
on the event and one for signaling the event, releasing all 
threads having called wait_for_event. Note how the
counter is incremented by signal_event in order to allow
the tasks to check whether new events have arrived.
The increment is modulo 3 in order to reduce the state 
space to be searched by the model checker. This is an in- 
formal abstraction in the sense that it has not been proven 
to preserve errors. Section 5 explains how an alternative 
counter abstraction for this program can be made and au- 
tomatically proved correct. 

class Event {
  int count = 0;

  public synchronized void wait_for_event() {
    try { wait(); } catch (InterruptedException e) {}
  }

  public synchronized void signal_event() {
    count = (count + 1) % 3;
    notifyAll();
  }
}

class FirstTask extends Thread {
  Event event1, event2;
  int count = 0;

  public void run() {
    count = event1.count;
    while (true) {
      if (count == event1.count)
        event1.wait_for_event();
      count = event1.count;
      event2.signal_event();
    }
  }
}


Figure 4: The RAX Error in Java

Figure 4 also shows the definition of one of the tasks.
This is an abstraction (in Java) of the LISP code presented
in Figure 3. The task’s activity is defined in the
run method of the class FirstTask, which itself extends
the Thread class, a built-in Java class that supports
thread primitives. The body of the run method
contains an infinite loop, where in each iteration a conditional
call of wait_for_event is executed. The condition
is that no new events have arrived, hence the event
counter is unchanged. After having applied JPF, the SPIN
model checker revealed the deadlock situation described
in Section 4.3. In the Java context a new event arrived after
the test (count == event1.count), but before the
call event1.wait_for_event().
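
The search that exposes this interleaving can be imitated by a small explicit-state exploration written directly in Java. The sketch below is a hand-made model and not output of JPF or SPIN. It rests on two assumptions that are not stated in the text: the second task (not shown in Figure 4) signals first and then conditionally waits, and both local counters are initialized before any signal is sent. The flag atomicTestAndWait models the test and the wait as one indivisible step, which is what a critical section would provide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RaxModel {
    // Program counters; each task cycles TEST -> UPDATE -> SIGNAL -> TEST,
    // passing through CALL_WAIT and BLOCKED when it decides to wait.
    static final int TEST = 0, CALL_WAIT = 1, BLOCKED = 2, UPDATE = 3, SIGNAL = 4;

    // State layout: {pc0, count0, pc1, count1, event0, event1}.
    // Task t waits on event t and signals event 1 - t.
    // Returns the successor state, or null if task t cannot move.
    static int[] step(int[] s, int t, boolean atomicTestAndWait) {
        int pc = s[2 * t];
        int other = 1 - t;
        if (pc == BLOCKED) return null;
        int[] n = s.clone();
        switch (pc) {
            case TEST:
                if (s[2 * t + 1] != s[4 + t]) n[2 * t] = UPDATE;
                else n[2 * t] = atomicTestAndWait ? BLOCKED : CALL_WAIT;
                break;
            case CALL_WAIT:
                n[2 * t] = BLOCKED; // wait() is entered only now
                break;
            case UPDATE:
                n[2 * t + 1] = s[4 + t]; // count = event.count
                n[2 * t] = SIGNAL;
                break;
            default: // SIGNAL
                n[4 + other] = (s[4 + other] + 1) % 3;
                // notifyAll wakes the other task only if it is already waiting
                if (s[2 * other] == BLOCKED) n[2 * other] = UPDATE;
                n[2 * t] = TEST;
        }
        return n;
    }

    // Breadth-first search over all interleavings; a deadlock is a
    // reachable state in which neither task has an enabled step.
    static boolean hasDeadlock(boolean atomicTestAndWait) {
        int[] init = {TEST, 0, SIGNAL, 0, 0, 0};
        Set<List<Integer>> seen = new HashSet<>();
        Deque<int[]> work = new ArrayDeque<>();
        work.add(init);
        seen.add(key(init));
        while (!work.isEmpty()) {
            int[] s = work.remove();
            int enabled = 0;
            for (int t = 0; t < 2; t++) {
                int[] n = step(s, t, atomicTestAndWait);
                if (n == null) continue;
                enabled++;
                if (seen.add(key(n))) work.add(n);
            }
            if (enabled == 0) return true;
        }
        return false;
    }

    private static List<Integer> key(int[] s) {
        List<Integer> k = new ArrayList<>();
        for (int v : s) k.add(v);
        return k;
    }
}
```

Under these assumptions, hasDeadlock(false) finds the state in which both tasks are blocked after a signal arrived between the test and the wait, while hasDeadlock(true) explores the whole (finite, thanks to the modulo 3 counters) state space without finding a deadlock.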

4.5 Discussion of Methodology 

The formal analysis of the Executive after the occurrence 
of the anomaly was preceded by a code inspection, which
identified the possible source of the error. Some of us 
spotted the potential error situation because it resembled 
the similar error we had found using SPIN in 1997, as de- 
scribed in Section 3.3. Due to the focus on the particular 
code fragment, it was relatively easy to perform the ab- 
straction needed to extract a Java program with a small 
finite state space. This took about two hours. However, 
the suspicion was only a suspicion, and a demonstration 
that the code was flawed was provided using JPF. This 
showed the usefulness of using a model checker to an- 
swer focused queries. 

Since the original source code was in LISP, we still
had to translate it by hand into Java, which goes against
JPF’s intended purpose. To avoid that, one would need
an abstraction tool and a translator for LISP. Since LISP’s
future within NASA is questionable we have focused on
providing these technologies for Java. Java is a very convenient
modeling language, providing most of the high
level features of the powerful Common LISP Object System
(CLOS), such as dynamically created objects with
methods and data. The main lesson from all experiments
done with JPF is that a non-trivial
amount of abstraction is needed in order to reduce the size
of a program’s state space. This problem is addressed in
Section 5.

5 An Abstraction Tool for Java 

As a part of the JPF project, we have been developing 
an automated abstraction tool which converts a Java pro- 
gram to an abstract program with respect to user-specified 
abstraction criteria. The user can specify abstractions by 
removing variables in the concrete program and/or adding
new variables (currently the tool supports adding boolean
types only) to the abstract program. Given a Java program
and such abstraction criteria, the tool generates an
abstract Java program in terms of the new abstract variables
and unremoved concrete variables. To compute the
conversion automatically, we use a decision procedure,
SVC (Stanford Validity Checker), which checks the validity
of logical expressions [1].

The abstraction tool is designed to deal with object-oriented
programs. The user can specify abstraction criteria
for each class by removing field variables in the class
and/or adding new abstract variables to the class. Therefore,
it can be used to abstract subcomponents in a program
when the whole program is too complicated to apply
abstraction globally. In addition, the user can specify
new abstract variables which depend on variables from
two different classes (inter-class abstraction).

There has been similar work by others [3, 15], all of
which require use of only global variables to describe
a system in simple languages similar to guarded commands.
Our tool, in contrast, targets a real programming language,
Java, and is able to deal with many problems caused
by its object-orientation.

5.1 Application of the Tool to the RA 

As we do not have enough space in this paper for a detailed
explanation of the abstraction algorithm, let us illustrate
the abstraction performed by the abstraction tool
on a part of the RA Java code shown in Figure 4. As
stated before, state explosion occurs because of the unbounded
increase of the count variable in the Event class
(in the original LISP code) and the assignment of the
count variable in the FirstTask class (as well as in
the SecondTask class which is not shown). Therefore,
we use abstraction to remove those count variables by
specifying Abstract.remove(count) in the classes
Event and FirstTask. In place of these variables, we
add new abstraction predicates which appear in the program
with the count variables. For instance, we put
Abstract.addBoolean("FcntEqEcnt", count==event1.count)
in the definition of the
FirstTask class to specify an abstraction predicate:
FirstTask.count is equal to Event.count. (For implementation
convenience, object names are used to refer
to class types.) We also used more inter-class abstractions
such as FcntGeEcnt (FirstTask.count is
greater than or equal to Event.count), ScntEqEcnt
(SecondTask.count is equal to Event.count), etc.

This is an example of an inter-class abstraction.
Dealing with such inter-class abstractions is more involved
than dealing with the abstractions inside one
class. For each inter-class abstraction, the tool generates
an additional class definition in the abstract program,
which contains new boolean variables corresponding
to the specified predicate. The boolean variables
in the new class are defined as a two-dimensional array
where each index refers to an object in either of
the two classes. In Figure 5, the new abstract variable
FcntEqEcnt.pred[Fobj][Eobj] corresponds to the
user-defined predicate FcntEqEcnt for an object Fobj
of FirstTask class and an object Eobj of Event class,
i.e., Fobj.count = Eobj.count.

Given the abstraction criteria, we now need to compute
the value of the abstract variables in the abstract program
so that they are consistent with the values of concrete variables
in the program. Figure 5 shows how the abstraction
tool converts the assignment statement, count = count
+ 1 (without the modulo operation) in Figure 4. First,
the concrete assignment statement is omitted in the abstract
program because the variable to be assigned has
been removed. Instead, the tool checks which of the new
abstract variables are possibly affected by this assignment
and generates corresponding assignments to those
abstract variables. For the example statement, a set of
boolean variables that refers to ‘this’ Event object will
be affected: FcntEqEcnt.pred[i][this] in Figure 5
(actually, we use functions that return the corresponding
index of a given object). To update those abstract variables,
a for-statement is used. For each of the abstract
variables, the pre-images that lead the abstract variable
to be true (or false) by the assignment are computed.
Then the pre-images are mapped into the abstract domain
by checking validity of the corresponding logical expressions.
Finally, the results are used as a guard condition
to set the abstract variables to true (or false). In the example,
the variable FcntEqEcnt.pred[i][this] will
be set to false if it was true (or if some condition with
another abstract variable holds). Otherwise, the variable
is set to a non-deterministic boolean value. Because the
concrete assignment statement is regarded as atomic, the set
of these abstract assignments is declared as atomic for
the JPF model checker. The additional statements for updating
other abstract variables such as FcntGeEcnt are
not shown in the figure.


Verify.beginAtomic();
// count = count + 1;
for (int i = 0; i < FcntEqEcnt.numFirstTask; ++i) {
  if (FcntEqEcnt.pred[i][FcntEqEcnt.getEvent(this)]
      | FcntGeEcnt.pred[i][FcntGeEcnt.getEvent(this)])
    FcntEqEcnt.pred[i][FcntEqEcnt.getEvent(this)] = false;
  else
    FcntEqEcnt.pred[i][FcntEqEcnt.getEvent(this)] = Verify.randomBool();
}
// similar code for updating other inter-class
// abstract variables such as FcntGeEcnt, etc.
Verify.endAtomic();

Figure 5 : Output of the abstraction tool for the assignment 
statement 
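
The idea behind such an abstract update can be checked on a small scale in plain Java. The sketch below considers only the predicate "f == e" (the role played by FcntEqEcnt) under the assignment e = e + 1, and uses a simplified guard of our own: the predicate becomes false if it held before, and is otherwise left non-deterministic. It does not reproduce the tool's algorithm or the second disjunct of the guard in Figure 5; the class and method names are hypothetical. Soundness here means that every concrete outcome is among the values the abstract update allows, which the code tests by enumeration over a finite range instead of a validity checker.

```java
final class EqAbstraction {
    // Values the predicate (f == e) may take after e = e + 1, given only
    // its value before the assignment: {false} if it held, else {false, true}.
    static boolean[] abstractPost(boolean eqBefore) {
        return eqBefore ? new boolean[] {false} : new boolean[] {false, true};
    }

    // An overly strong rule, for comparison: always set the predicate to false.
    static boolean[] alwaysFalse(boolean eqBefore) {
        return new boolean[] {false};
    }

    // Checks on [lo, hi] that the concrete value of (f == e + 1) is always
    // one of the values permitted by the chosen abstract update.
    static boolean soundOnRange(int lo, int hi, boolean useSimplifiedRule) {
        for (int f = lo; f <= hi; f++) {
            for (int e = lo; e <= hi; e++) {
                boolean before = (f == e);
                boolean after = (f == e + 1);
                boolean[] allowed =
                    useSimplifiedRule ? abstractPost(before) : alwaysFalse(before);
                if (!contains(allowed, after)) return false;
            }
        }
        return true;
    }

    private static boolean contains(boolean[] values, boolean v) {
        for (boolean x : values) {
            if (x == v) return true;
        }
        return false;
    }
}
```

With these definitions, soundOnRange(-50, 50, true) holds, whereas the always-false rule fails the check (for example f = 1, e = 0), which illustrates why the non-deterministic branch is needed when nothing more is known about the relation between the two counters.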

5.2 Discussion of Methodology 

Using the tool, we have been able to obtain an abstract 
Java program of the RA code automatically. In the exam- 
ple, the unbounded integer variables are replaced by a set 
of boolean variables, hence the abstract program is free 
from the state explosion. Moreover, use of the tool helps 
to avoid error-prone abstractions based on human reason- 
ing. The tool generates a sound approximation of the 
concrete program using an automated validity checker, al- 
though it is not necessarily the most accurate one. 

However, the user must give reasonable abstraction criteria
for the tool to generate a meaningful abstract program
in order to check some desired properties. If
the abstraction criteria are not good enough, the result will
be too coarse an abstract program, which cannot preserve
the properties to be checked.

6 Conclusion 

This paper describes two major verification efforts carried 
out within the Automated Software Engineering Group 
at NASA Ames Research Center. The first effort con- 
sisted of analyzing part of the RA autonomous space craft 
software using the SPIN model checker. One of the er- 
rors found with SPIN, a missing critical section around a 
conditional wait statement, was in fact reintroduced in a 
different subsystem that was not verified in this first pre- 
flight effort. This error caused a real deadlock in the RA
during flight in space. 

Such concurrency-related errors only happen as the result
of particular scheduling circumstances. Scheduling is
totally uncontrolled when tests are run, and is highly sensitive
to variations in the operating environment (e.g. operating
system, other running tasks). This explains why
the anomaly happened in flight, though it had not occurred
even once in thousands of previous runs on the
various ground testbeds.

Developing the formal model of the program was, however,
a time-consuming task, requiring a manual translation
from the RA LISP code to the PROMELA language
of the SPIN model checker. In addition, code details
had to be abstracted away in order to obtain a small
enough finite state system that could be effectively model
checked. The translation difficulty spawned the initiative
to automate the translation from high level programming
languages to modeling languages for formal verification,
such as PROMELA. Java was chosen as the source language
because of its modern programming language constructs,
such as support for object-oriented programming,
and the standardization across implementations of its concurrency
constructs. An automatic translator from Java to
PROMELA was designed and implemented, called Java
PathFinder (JPF). With JPF one can model check smaller
Java programs for assertion violations, deadlocks, and
general linear temporal logic properties. The translator
covers a substantial subset of Java, illustrating the feasibility
of the approach.

In the second effort, JPF was used for modeling the 
RAX deadlock after it occurred. That is, after the front- 
end team isolated a reduced subset of the code that likely 
included the error, the back-end team developed a Java 
program which exposed the error. The translator trans- 
lated this into a PROMELA model, and the model check- 
ing of this model then immediately revealed the error. 
Java turned out to be an excellent choice as a modeling 
language, with a high level of abstraction, due to its object 
oriented features. In later work, a system that automates 
certain aspects of predicate abstraction was developed and 
successfully demonstrated on the same example. 

This experience gave a clear demonstration that model
checking can locate errors that are very hard to find with
normal testing and can nevertheless compromise a system’s
safety. It stands as one of the more successful applications
of formal methods to date. In its report of the
RAX incident, the RAX team indeed acknowledges that
it “provides a strong impetus for research on formal verification
of flight critical systems” [13].

A posteriori, given the successful partial results, one
can wonder why the first verification effort was not extended
to the rest of the Executive, which might have
spotted the error before it occurred in flight. There are
two reasons for that. First, the purpose of the effort was
to evaluate the verification technology, not to validate the
RA. The ASE team had neither the mission nor the resources
needed for a full-scale modeling and verification
effort. Second, the part of the code in which the error was
found was written after the end of the first verification
experience.

Regarding software verification, the work presented 
here demonstrates two main points. First of all, we be- 
lieve that it is worthwhile to do source code verification 
since code may contain serious errors that probably will 
not reveal themselves in a design. Hence, although design 
verification may have the economic benefit of catching
errors early, code verification will always be needed to 
catch errors that have survived any good practice. Code 
will always by definition contain more details than the 
design - any such detail being a potential contributor to 
failure. 

Second, we believe that model checking source code is 
practical. The translation issue can be fully automated, 
as we have demonstrated. The remaining technical chal- 
lenge is scaling the technology to work with larger pro- 
grams - programs that could have very large state spaces 
unless suitably abstracted. Abstraction is of course a ma- 
jor obstacle, but our experience has been that this effort 
can be minimized given a set of supporting tools. 